
21 research outputs found

Algorithms and data structures for grammar-compressed strings

Author: Cording Patrick Hagge
Publication venue: Technical University of Denmark
Publication date: 01/01/2015

Boxed Permutation Pattern Matching

Author: Amit Mika
Bille Philip
Cording Patrick Hagge
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)
Publication date: 01/01/2016

Given permutations T and P of length n and m, respectively, the Permutation Pattern Matching problem asks to find all m-length subsequences of T that are order-isomorphic to P. This problem has a wide range of applications but is known to be NP-hard. In this paper, we study the special case where the goal is to find only the boxed subsequences of T that are order-isomorphic to P. This problem was introduced by Bruner and Lackner, who showed that it can be solved in O(n^3) time. Cho et al. [CPM 2015] gave an O(n^2 m) time algorithm and improved it to O(n^2 log m). In this paper we present a solution that uses only O(n^2) time. In general, there are instances where the output size is Ω(n^2), and hence our bound is optimal. To achieve our results, we introduce several new ideas, including a novel reduction to 2D offline dominance counting. Our algorithm is surprisingly simple and straightforward to implement.
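To make the problem definition concrete, here is a brute-force reference checker. It is not the O(n^2) algorithm from the paper: it enumerates every m-subset of positions, so it is exponential and only useful for validating tiny inputs. It assumes the boxed condition from Bruner and Lackner's boxed mesh patterns: apart from the occurrence itself, no element of T may lie strictly inside the occurrence's bounding box (between its first and last position and between its minimum and maximum value). The names `order_pattern` and `boxed_occurrences` are hypothetical.

```python
from itertools import combinations


def order_pattern(seq):
    """Rank tuple of seq (0 = smallest); equal tuples mean order-isomorphic."""
    order = sorted(range(len(seq)), key=lambda k: seq[k])
    ranks = [0] * len(seq)
    for r, k in enumerate(order):
        ranks[k] = r
    return tuple(ranks)


def boxed_occurrences(T, P):
    """Brute-force reference (assumes len(P) >= 1): all index tuples of T whose
    values are order-isomorphic to P and whose bounding box holds no other
    element of T. Exponential in general; only for checking tiny inputs."""
    target = order_pattern(P)
    result = []
    for idx in combinations(range(len(T)), len(P)):
        vals = [T[k] for k in idx]
        if order_pattern(vals) != target:
            continue
        lo, hi = min(vals), max(vals)
        chosen = set(idx)
        # Boxed: every position inside the span that is not part of the
        # occurrence must have a value outside the open range (lo, hi).
        if all(not lo < T[k] < hi
               for k in range(idx[0], idx[-1] + 1) if k not in chosen):
            result.append(idx)
    return result


print(boxed_occurrences([1, 2, 3], [1, 2]))  # [(0, 1), (1, 2)]
```

In the usage example, (0, 2) is order-isomorphic to P but is not boxed, because the value 2 at position 1 falls inside its box. A checker like this is handy for randomized testing of a faster implementation.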

Dagstuhl Research Online Publication Server

Lempel-Ziv Compression in a Sliding Window

Author: Bille Philip
Cording Patrick Hagge
Fischer Johannes
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)
Publication date: 01/01/2017

We present new algorithms for the sliding window Lempel-Ziv (LZ77) problem and the approximate rightmost LZ77 parsing problem. Our main result is a new and surprisingly simple algorithm that computes the sliding window LZ77 parse in O(w) space and either O(n) expected time or O(n log log w + z log log s) deterministic time. Here, w is the window size, n is the size of the input string, z is the number of phrases in the parse, and s is the size of the alphabet. This matches the space and time bounds of previous results while removing constant-size restrictions on the alphabet size. To achieve our result, we combine a simple modification and augmentation of the suffix tree with periodicity properties of sliding windows. We also apply this new technique to obtain an algorithm for the approximate rightmost LZ77 problem that uses O(n(log z + log log n)) time and O(n) space and produces a (1+ε)-approximation of the rightmost parsing (for any constant ε > 0). While this does not improve the best known time-space trade-offs for exact rightmost parsing, our algorithm is significantly simpler and exposes a direct connection between sliding window parsing and the approximate rightmost matching problem.
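For readers unfamiliar with the sliding window parse itself, the following is a minimal brute-force sketch of a greedy sliding window LZ77 parse. It is not the suffix-tree algorithm described above. It assumes one common phrase convention (hypothetical here): a phrase is either a literal `(0, ch)` or a copy `(offset, length)` whose source starts at most w positions back, and copies may overlap the text they produce. Each candidate match is no longer than the chosen one, so the total work is O(n·w). The function names are hypothetical.

```python
def lz77_sliding_window(s, w):
    """Greedy sliding window LZ77 parse by brute force, O(n*w) time overall.
    Each phrase is (0, ch), a literal character, or (offset, length) with
    1 <= offset <= w, meaning "copy `length` characters starting `offset`
    positions back"; copies may overlap the text they produce."""
    phrases = []
    i, n = 0, len(s)
    while i < n:
        best_len, best_off = 0, 0
        # Candidate sources start inside the window s[i-w .. i-1].
        for j in range(max(0, i - w), i):
            length = 0
            while i + length < n and s[j + length] == s[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len == 0:
            phrases.append((0, s[i]))  # literal: character not in window
            i += 1
        else:
            phrases.append((best_off, best_len))
            i += best_len
    return phrases


def lz77_decode(phrases):
    """Invert lz77_sliding_window."""
    out = []
    for off, x in phrases:
        if off == 0:
            out.append(x)
        else:
            start = len(out) - off
            for k in range(x):  # one character at a time, so overlaps work
                out.append(out[start + k])
    return "".join(out)


print(lz77_sliding_window("abababab", 2))  # [(0, 'a'), (0, 'b'), (2, 6)]
```

The last phrase copies six characters from only two positions back. This self-overlap is what lets a small window still encode long periodic runs.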

Dagstuhl Research Online Publication Server

Finger Search in Grammar-Compressed Strings

Author: Bille Philip
Christiansen Anders Roy
Cording Patrick Hagge
Gørtz Inge Li
Publication date: 01/01/2016

Grammar-based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. Given a grammar, the random access problem is to compactly represent the grammar while supporting random access, that is, given a position in the original uncompressed string, report the character at that position. In this paper we study the random access problem with the finger search property, that is, the time for a random access query should depend on the distance between a specified index f, called the finger, and the query index i. We consider both a static variant, where we first place a finger and subsequently access indices near the finger efficiently, and a dynamic variant where moving the finger is also supported, in time that depends on the distance moved. Let n be the size of the grammar, and let N be the size of the string. For the static variant we give a linear space representation that supports placing the finger in O(log N) time and subsequently accessing in O(log D) time, where D is the distance between the finger and the accessed index. For the dynamic variant we give a linear space representation that supports placing the finger in O(log N) time and accessing and moving the finger in O(log D + log log N) time. Compared to the best linear space solution to random access, we improve an O(log N) query bound to O(log D) for the static variant and to O(log D + log log N) for the dynamic variant, while maintaining linear space. As an application of our results we obtain an improved solution to the longest common extension problem in grammar-compressed strings. To obtain our results, we introduce several new techniques of independent interest, including a novel van Emde Boas style decomposition of grammars.
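The paper's data structures are not reproduced here. As a minimal sketch of the baseline they improve on, assume a straight-line program (SLP) stored as a Python dict, where each symbol maps to a character or to a pair of symbols (a hypothetical representation). Random access then descends from the start symbol using precomputed expansion lengths, which takes time proportional to the grammar's height rather than O(log N). The hypothetical `StaticFinger` class illustrates only the finger idea. It remembers the root-to-leaf path for f, and for a query i it climbs that path until the current subtree covers i before descending. Nearby queries often restart low in the tree, but this naive scheme does not guarantee the O(log D) bound.

```python
class SLP:
    """Straight-line program: rules map a symbol to a terminal character
    (a str of length 1) or to a pair (left, right) of symbols."""

    def __init__(self, rules, start):
        self.rules, self.start = rules, start
        self.length = {}
        for x in rules:
            self._len(x)
        self.n = self.length[start]  # N, the length of the derived string

    def _len(self, x):
        if x not in self.length:
            r = self.rules[x]
            self.length[x] = 1 if isinstance(r, str) else self._len(r[0]) + self._len(r[1])
        return self.length[x]

    def descend(self, x, i, base=0, path=None):
        """Return character i of x's expansion. `base` is where x's expansion
        starts in S; visited (symbol, base) pairs are appended to `path`."""
        while True:
            if path is not None:
                path.append((x, base))
            r = self.rules[x]
            if isinstance(r, str):
                return r
            left, right = r
            if i < self.length[left]:
                x = left
            else:
                i -= self.length[left]
                base += self.length[left]
                x = right

    def access(self, i):
        """Plain random access: one walk from the start symbol."""
        return self.descend(self.start, i)


class StaticFinger:
    """Hypothetical naive finger: keep the root-to-leaf path of position f."""

    def __init__(self, slp, f):
        self.slp, self.path = slp, []
        slp.descend(slp.start, f, 0, self.path)

    def access(self, i):
        if not 0 <= i < self.slp.n:
            raise IndexError(i)
        # Climb the stored path until a symbol's span [base, base+len) covers i.
        for x, base in reversed(self.path):
            if base <= i < base + self.slp.length[x]:
                return self.slp.descend(x, i - base, base)


EXAMPLE_RULES = {"A": "a", "B": "b", "X1": ("A", "B"),
                 "X2": ("X1", "X1"), "X3": ("X2", "A")}  # X3 derives "ababa"
slp = SLP(EXAMPLE_RULES, "X3")
finger = StaticFinger(slp, 2)
print(slp.access(4), finger.access(3))  # a b
```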

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Online Research Database In Technology

Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation

Author: Bille Philip
Cording Patrick Hagge
Skjoldjensen Frederik Rye
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th International Symposium on Algorithms and Computation (ISAAC 2016)
Publication date: 01/01/2016

Given a static reference string R and a source string S, a relative compression of S with respect to R is an encoding of S as a sequence of references to substrings of R. Relative compression schemes are a classic model of compression and have recently proved very successful for compressing highly repetitive massive data sets such as genomes and web data. We initiate the study of relative compression in a dynamic setting where the compressed source string S is subject to edit operations. The goal is to maintain the compressed representation compactly, while supporting edits and allowing efficient random access to the (uncompressed) source string. We present new data structures that achieve optimal time for updates and queries while using space linear in the size of the optimal relative compression, for nearly all combinations of parameters. We also present solutions for restricted and extended sets of updates. To achieve these results, we revisit the dynamic partial sums problem and the substring concatenation problem. We present new optimal or near-optimal bounds for these problems. Plugging in our new results, we also immediately obtain new bounds for the string indexing for patterns with wildcards problem and the dynamic text and static pattern matching problem.
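As a static illustration of the encoding described above, the sketch below parses S into references `(pos, length)`, each meaning R[pos : pos+length]. It then answers random access with prefix sums of the phrase lengths and a binary search (`bisect`). It assumes a greedy longest-match parse and brute-force matching, chosen for simplicity. Neither is taken from the paper, and edits are not supported. The names `relative_compress` and `RelativeIndex` are hypothetical.

```python
import bisect


def relative_compress(R, S):
    """Greedy relative parse (an assumed strategy): repeatedly take the longest
    prefix of the rest of S that occurs in R. Brute force, for illustration."""
    phrases, i = [], 0
    while i < len(S):
        best_pos, best_len = -1, 0
        for p in range(len(R)):
            length = 0
            while (i + length < len(S) and p + length < len(R)
                   and R[p + length] == S[i + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = p, length
        if best_len == 0:
            raise ValueError(f"character {S[i]!r} does not occur in R")
        phrases.append((best_pos, best_len))  # reference to R[pos : pos+len]
        i += best_len
    return phrases


class RelativeIndex:
    """Static random access into S via prefix sums of phrase lengths."""

    def __init__(self, R, phrases):
        self.R, self.phrases = R, phrases
        self.starts, total = [], 0  # starts[k]: position in S where phrase k begins
        for _, length in phrases:
            self.starts.append(total)
            total += length

    def access(self, i):
        k = bisect.bisect_right(self.starts, i) - 1  # last phrase starting at or before i
        pos, _ = self.phrases[k]
        return self.R[pos + i - self.starts[k]]


R, S = "abcab", "cabca"
phrases = relative_compress(R, S)
print(phrases)  # [(2, 3), (2, 2)]
idx = RelativeIndex(R, phrases)
print("".join(idx.access(i) for i in range(len(S))))  # cabca
```

In the dynamic setting, an edit to S changes phrase lengths and can split or merge phrases. A fixed prefix-sum array would then need rebuilding, which is why the dynamic partial sums problem comes up.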

Dagstuhl Research Online Publication Server

Lempel-Ziv Compression in a Sliding Window

Author: Bille Philip
Cording Patrick Hagge
Fischer Johannes
Gørtz Inge Li
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/01/2017

Online Research Database In Technology

Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation

Author: Bille Philip
Cording Patrick Hagge
Gørtz Inge Li
Skjoldjensen Frederik Rye
Vildhøj Hjalte Wedel
Vind Søren
Publication date: 01/01/2016

Given a static reference string R and a source string S, a relative compression of S with respect to R is an encoding of S as a sequence of references to substrings of R. Relative compression schemes are a classic model of compression and have recently proved very successful for compressing highly repetitive massive data sets such as genomes and web data. We initiate the study of relative compression in a dynamic setting where the compressed source string S is subject to edit operations. The goal is to maintain the compressed representation compactly, while supporting edits and allowing efficient random access to the (uncompressed) source string. We present new data structures that achieve optimal time for updates and queries while using space linear in the size of the optimal relative compression, for nearly all combinations of parameters. We also present solutions for restricted and extended sets of updates. To achieve these results, we revisit the dynamic partial sums problem and the substring concatenation problem. We present new optimal or near-optimal bounds for these problems. Plugging in our new results, we also immediately obtain new bounds for the string indexing for patterns with wildcards problem and the dynamic text and static pattern matching problem.
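The dynamic partial sums problem named in this abstract is what turns random access into a search over phrase lengths that can change. The sketch below uses a Fenwick (binary indexed) tree as a standard textbook stand-in, not the structure from the paper. It supports adding to an entry, prefix sums, and searching for the entry that contains a given position, each in logarithmic time. It can change a phrase's length but cannot insert or delete phrases, both of which the full dynamic setting needs. The class name `PartialSums` is hypothetical.

```python
class PartialSums:
    """Fenwick (binary indexed) tree over non-negative integers, e.g. phrase lengths."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (self.n + 1)  # 1-indexed internal array
        for k, v in enumerate(values):
            self.update(k, v)

    def update(self, k, delta):
        """Add delta to entry k (0-based) in logarithmic time."""
        k += 1
        while k <= self.n:
            self.tree[k] += delta
            k += k & -k

    def prefix_sum(self, k):
        """Sum of entries 0..k-1."""
        total = 0
        while k > 0:
            total += self.tree[k]
            k -= k & -k
        return total

    def search(self, t):
        """For 0 <= t < total sum, return (k, r): position t of S lies at
        offset r inside entry (phrase) k. Requires non-negative entries."""
        pos, step = 0, 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] <= t:
                pos, t = nxt, t - self.tree[nxt]
            step >>= 1
        return pos, t


lengths = PartialSums([3, 1, 4])  # phrase lengths of a hypothetical parse
print(lengths.search(3))  # (1, 0)
lengths.update(1, 2)      # phrase 1 grows from length 1 to 3
print(lengths.search(5))  # (1, 2)
```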

arXiv.org e-Print Archive

Online Research Database In Technology

Fingerprints in compressed strings

Author: Bille Philip
Cording Patrick Hagge
Gørtz Inge Li
Sach Benjamin
Vildhøj Hjalte Wedel
Vind Søren
Publication venue: Elsevier BV
Publication date: 01/06/2017

The Karp-Rabin fingerprint of a string is a type of hash value that, due to its strong properties, has been used in many string algorithms. In this paper we show how to construct a data structure for a string S of size N compressed by a context-free grammar of size n that answers fingerprint queries. That is, given indices i and j, the answer to a query is the fingerprint of the substring S[i, j]. We present the first O(n) space data structures that answer fingerprint queries without decompressing any characters. For Straight Line Programs (SLP) we get O(log N) query time, and for Linear SLPs (an SLP derivative that captures LZ78 compression and its variations) we get O(log log N) query time. Hence, our data structures have the same time and space complexity as for random access in SLPs. We utilize the fingerprint data structures to solve the longest common extension problem in query time O(log N log ℓ) and O(log ℓ log log ℓ + log log N) for SLPs and Linear SLPs, respectively. Here, ℓ denotes the length of the LCE.
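The key identity behind fingerprints on a grammar is fp(XY) = fp(X)·B^|Y| + fp(Y) mod p. It lets every grammar symbol store its own fingerprint, so a prefix fingerprint can be assembled during a single root-to-leaf walk without decompressing anything. The sketch below assumes this polynomial form, with a Mersenne prime modulus and a random base (both assumed choices). It uses half-open substrings S[i:j] rather than the inclusive S[i, j] above. The walk costs time proportional to the grammar's height, so it matches O(log N) only for balanced grammars, while the paper achieves O(log N) for arbitrary SLPs. The LCE query uses exponential search followed by binary search over fingerprint comparisons, which needs O(log ℓ) comparisons. Its answer is correct unless two fingerprints collide. All names are hypothetical.

```python
import random

MOD = (1 << 61) - 1                    # Mersenne prime modulus (assumed choice)
BASE = random.randrange(256, MOD - 1)  # random base, as Karp-Rabin hashing requires


def direct_fp(s):
    """Karp-Rabin fingerprint of an uncompressed string (for checking)."""
    acc = 0
    for ch in s:
        acc = (acc * BASE + ord(ch)) % MOD
    return acc


class FingerprintSLP:
    """Per-symbol fingerprints of an SLP, combined with fp(XY) = fp(X)*B^|Y| + fp(Y)."""

    def __init__(self, rules, start):
        self.rules, self.start = rules, start
        self.length, self.fp, self.pw = {}, {}, {}  # pw[X] = BASE^|X| mod MOD
        for x in rules:
            self._build(x)
        self.n = self.length[start]

    def _build(self, x):
        if x in self.fp:
            return
        r = self.rules[x]
        if isinstance(r, str):
            self.length[x], self.fp[x], self.pw[x] = 1, ord(r) % MOD, BASE
            return
        left, right = r
        self._build(left)
        self._build(right)
        self.length[x] = self.length[left] + self.length[right]
        self.fp[x] = (self.fp[left] * self.pw[right] + self.fp[right]) % MOD
        self.pw[x] = self.pw[left] * self.pw[right] % MOD

    def prefix_fp(self, k):
        """Fingerprint of S[0:k] from one root-to-leaf walk; nothing is decompressed."""
        acc, x = 0, self.start
        while k > 0:
            if k == self.length[x]:  # the whole expansion of x is in the prefix
                return (acc * self.pw[x] + self.fp[x]) % MOD
            left, right = self.rules[x]
            if k >= self.length[left]:  # absorb all of left, continue right
                acc = (acc * self.pw[left] + self.fp[left]) % MOD
                k -= self.length[left]
                x = right
            else:
                x = left
        return acc

    def fingerprint(self, i, j):
        """Fingerprint of the half-open substring S[i:j]."""
        return (self.prefix_fp(j) - self.prefix_fp(i) * pow(BASE, j - i, MOD)) % MOD

    def lce(self, i, j):
        """Longest common extension of suffixes i and j (correct unless fingerprints collide)."""
        limit = self.n - max(i, j)

        def match(ell):
            return self.fingerprint(i, i + ell) == self.fingerprint(j, j + ell)

        ell = 1
        while ell <= limit and match(ell):  # exponential search
            ell *= 2
        lo, hi = ell // 2, min(ell, limit + 1)  # match(lo) holds; hi fails or is out of range
        while hi - lo > 1:                      # binary search
            mid = (lo + hi) // 2
            if match(mid):
                lo = mid
            else:
                hi = mid
        return lo


rules = {"A": "a", "B": "b", "X1": ("A", "B"), "X2": ("X1", "X1"), "X3": ("X2", "A")}
fs = FingerprintSLP(rules, "X3")  # S = "ababa"
print(fs.fingerprint(1, 4) == direct_fp("bab"), fs.lce(0, 2))  # True 3
```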

CiteSeerX

Online Research Database In Technology

Explore Bristol Research

Finger Search in Grammar-Compressed Strings

Author: Bille Philip
Christiansen Anders Roy
Cording Patrick Hagge
Gørtz Inge Li
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

Online Research Database In Technology